A PLSA-based language model for conversational telephone speech
Abstract
This paper describes experiments with a PLSA-based language model for conversational telephone speech. The model uses a long-range history and exploits topic information in the test text to adjust the probabilities of test words. The PLSA-based model lowered test-set perplexity relative to a traditional word+class-based n-gram by 13% (an optimistic estimate, using the reference transcript as history) or by 6% (a realistic estimate, using the recognised transcript as history). The paper also introduces the use of confidence scores to weight words in the history, a weight on the prior topic distribution, and a way of calculating perplexity that accounts for recognition errors in the model context.
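The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes fixed topic-word distributions P(w|z), re-estimates the topic mixture from a history by EM folding-in, scales each history word's contribution by a recognition confidence, and treats the prior topic distribution as weighted pseudo-counts. All names (`adapt_topic_weights`, `prior_weight`, the toy vocabulary and probabilities) are hypothetical.

```python
import math

FLOOR = 1e-12  # assumed floor for words unseen under a topic


def adapt_topic_weights(history, confidences, p_w_given_z, prior,
                        prior_weight=1.0, iters=20):
    """Estimate P(z | history) by EM with confidence-weighted word counts."""
    n_topics = len(prior)
    p_z = list(prior)
    for _ in range(iters):
        # The prior topic distribution enters as weighted pseudo-counts.
        counts = [prior_weight * p for p in prior]
        for word, conf in zip(history, confidences):
            # E-step: posterior over topics for this history word.
            post = [p_z[z] * p_w_given_z[z].get(word, FLOOR)
                    for z in range(n_topics)]
            total = sum(post)
            # Low-confidence (likely misrecognised) words count for less.
            for z in range(n_topics):
                counts[z] += conf * post[z] / total
        # M-step: renormalise the accumulated counts.
        norm = sum(counts)
        p_z = [c / norm for c in counts]
    return p_z


def word_prob(word, p_z, p_w_given_z):
    """P(w | history) as a topic mixture: sum over z of P(w|z) P(z|history)."""
    return sum(p_z[z] * p_w_given_z[z].get(word, FLOOR)
               for z in range(len(p_z)))


def perplexity(words, p_z, p_w_given_z):
    """Perplexity of a word sequence under a fixed topic mixture."""
    log_sum = sum(math.log(word_prob(w, p_z, p_w_given_z)) for w in words)
    return math.exp(-log_sum / len(words))


# Toy data (invented for illustration): topic 0 is sports, topic 1 is finance.
p_w_given_z = [
    {"game": 0.5, "team": 0.4, "bank": 0.1},
    {"game": 0.1, "team": 0.1, "bank": 0.8},
]
prior = [0.5, 0.5]
history = ["game", "team", "bank"]
confidences = [0.9, 0.8, 0.2]  # "bank" is a low-confidence recognition

adapted = adapt_topic_weights(history, confidences, p_w_given_z, prior)
test_words = ["game", "team"]
ppl_prior = perplexity(test_words, prior, p_w_given_z)
ppl_adapted = perplexity(test_words, adapted, p_w_given_z)
```

In this toy setting the adapted mixture shifts towards the sports topic, so `ppl_adapted` comes out lower than `ppl_prior` on the sports-like test words. A real system would typically combine such topic-based probabilities with an n-gram model; how the paper does so is not stated in the abstract.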
Similar resources
A New Bigram-PLSA Language Model for Speech Recognition
A novel method for combining a bigram model and Probabilistic Latent Semantic Analysis (PLSA) is introduced for language modeling. The motivation behind this idea is the relaxation of the “bag of words” assumption fundamentally present in latent topic models, including the PLSA model. An EM-based parameter estimation technique for the proposed model is presented in this paper. Previous attempts to...
Improving English Conversational Telephone Speech Recognition
The goal of this work is to build a state-of-the-art English conversational telephone speech recognition system. We investigated several techniques to improve acoustic modeling, namely speaker-dependent bottleneck features, deep Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks, data augmentation and score fusion of DNN and BLSTM models. Training set consisted of the 300 ho...
Recognizing Call-center Speech Using Models Trained from Other Domains
In this paper, we introduce a new conversational speech task – recognizing call-center speech – using data collected from Dragon’s own technical support line. We compare performance of models trained from conversational telephone speech (the Switchboard corpus) and models trained from predominantly read, microphone speech, and report on a series of experiments focusing on adapting the microphon...
Experiments for an approach to language identification with conversational telephone speech
This paper presents our recent work on language identification research using conversational speech (the LDC Conversational Telephone Speech Database). The baseline system used in this study was developed recently [4, 5]. It is based on language-dependent phone recognition and phonotactic constraints. The system was trained using monologue data and obtained an error rate of around 9% on a com...
Using Continuous Space Language Models for Conversational Speech Recognition
Language modeling for conversational speech suffers from the limited amount of available adequate training data. This paper describes a new approach that performs the estimation of the language model probabilities in a continuous space, allowing by these means smooth interpolation of unobserved n-grams. This continuous space language model is used during the last decoding pass of a state-of-the...